Add a docling service file conversion feature #455

Merged on Jan 22, 2025 (2 commits)

nerdalert
Member

  • Integrates automated file conversion via docling-serve into the native knowledge submission component.
  • Supports the docling conversion file types PDF, DOCX, PPTX, XLSX, images, HTML and AsciiDoc, so converted files can be used in the next step, context selection (see the demo videos below).
  • Runs a health check against docling-serve when files are selected for upload. If the service is unavailable, only md files are accepted.
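
The fallback behaviour described in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names, the injected `probe` callback, and the concrete extension list (in particular the image and AsciiDoc extensions) are all assumptions made for the example.

```typescript
// Hypothetical sketch: names and the extension list are assumptions, not taken from the PR.

// Markdown is accepted whether or not the conversion service is reachable.
const MARKDOWN_EXTENSIONS: string[] = [".md"];

// Types that need conversion first. The image and AsciiDoc extensions are guesses.
const CONVERTIBLE_EXTENSIONS: string[] = [
  ".pdf", ".docx", ".pptx", ".xlsx", ".html", ".adoc", ".png", ".jpg", ".jpeg",
];

// Runs a caller-supplied health probe and treats any thrown error as "unavailable".
async function isServiceHealthy(probe: () => Promise<boolean>): Promise<boolean> {
  try {
    return await probe();
  } catch {
    return false;
  }
}

// Returns the extensions the upload control should accept for a given health state.
function acceptedExtensions(serviceHealthy: boolean): string[] {
  return serviceHealthy
    ? [...MARKDOWN_EXTENSIONS, ...CONVERTIBLE_EXTENSIONS]
    : [...MARKDOWN_EXTENSIONS];
}

// Checks a file name against the accepted list, ignoring case.
function isAccepted(fileName: string, accepted: string[]): boolean {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) {
    return false; // no extension at all
  }
  return accepted.includes(fileName.slice(dot).toLowerCase());
}
```

Passing the probe in as a callback keeps the gating logic independent of how the service is actually contacted, so it can be exercised without a running docling-serve instance.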

This depends on #439. That PR's changes are included in this PR to spare myself another rebase. I will leave this in draft until #439 merges.

Demo .pptx conversion:

pptx-ui-docling-service-demo.mp4

Demo Image conversion:

image-ocr-ui-docling-service-demo.mp4

@Misjohns
Collaborator

@nerdalert
Does each file get converted into its own MD file, or do all the files get converted into a single large MD file? My concern is that if a user uploads multiple files and we create a single MD from them, it would be more difficult to locate the context in that file. I believe it would be a better UX to create an MD for each uploaded file for the context selection step. If needed, we could then merge the MDs into a single file at submit.
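
For illustration only, the per-file approach suggested above could look something like this. The `ConvertedFile` shape and the `mergeMarkdown` helper are hypothetical and do not come from the PR; the sketch assumes each upload has already been converted to its own Markdown string.

```typescript
// Hypothetical sketch: one Markdown document per uploaded file, merged only at submit.

interface ConvertedFile {
  sourceName: string; // original upload name, e.g. "slides.pptx"
  markdown: string;   // Markdown produced for that single file
}

// Joins the per-file documents, tagging each section with its source file
// so the origin of a passage stays traceable in the merged output.
function mergeMarkdown(files: ConvertedFile[]): string {
  return files
    .map((f) => `<!-- source: ${f.sourceName} -->\n${f.markdown.trim()}`)
    .join("\n\n");
}
```

Keeping the files separate until submit means the context selection step can present one document per upload, and the merge remains a simple final pass.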

@vishnoianil
Member

@nerdalert this PR requires a rebase.

@nerdalert force-pushed the file-conversion-service branch from a49e78b to 3ab911c on January 21, 2025 at 20:02
- Integrates automated file conversion via docling-serve to the
  native knowledge submission component.
- Supports the docling conversions PDF, DOCX, PPTX, XLSX, Images,
  HTML and AsciiDoc to be used in the next step of context selection.

Signed-off-by: Brent Salisbury <[email protected]>
@nerdalert force-pushed the file-conversion-service branch from 3ab911c to f9df911 on January 21, 2025 at 20:18
Member

@vishnoianil left a comment

LGTM

@vishnoianil merged commit 5eb64a3 into instructlab:main on Jan 22, 2025
6 checks passed